Optical character recognition system

ABSTRACT

An optical character recognition system is arranged to process format conditioning documents which designate the locations of data fields on data documents to be read. Those portions of the data documents outside the data fields which are designated by the conditioning document are rendered invisible to the system, thereby permitting the system to process only data which may be located in the midst of other printed matter. A fine line adjustment capability permits the system to reposition the document as necessary to locate a data field which is not located precisely as had been indicated by the format conditioning document. The system is thus automatically adaptable to any data format without requiring specialized programming. The system samples successive vertical slices of data characters being optically scanned and reassembles the slices electronically for purposes of recognition. The position of each slice is analyzed in order to predict the position, relative to the photo-sensitive array, of the next slice. This permits reliable system adaptation to a data line which is fully or partially skewed relative to its document page. Failure to recognize one or more data characters during scan of a line results in successive automatic re-scans of that line with automatic increase of detection sensitivity during each re-scan to permit characters of relatively light print to be recognized.
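The slice-position prediction mentioned in the abstract can be illustrated with a minimal Python sketch. It is a software model under stated assumptions, not the patented circuit. Each vertical slice is a list of 0/1 values, one per array element. The position of a slice is taken as the midpoint of its inked span relative to the array centre. The predictor assumes the next slice will sit where the last non-blank slice did. The names `slice_offset` and `track_line` and the midpoint measure are hypothetical choices for the example.

```python
from typing import List, Optional, Tuple


def slice_offset(bits: List[int]) -> Optional[float]:
    """Offset of the inked span's midpoint from the array centre.

    Returns None for a blank slice (no element sees ink).
    """
    inked = [i for i, bit in enumerate(bits) if bit]
    if not inked:
        return None
    midpoint = (inked[0] + inked[-1]) / 2
    centre = (len(bits) - 1) / 2
    return midpoint - centre


def track_line(slices: List[List[int]]) -> List[Tuple[float, Optional[float]]]:
    """Pair each slice's predicted offset with the offset actually observed."""
    predicted = 0.0  # assumption: the line starts centred in the array
    history: List[Tuple[float, Optional[float]]] = []
    for bits in slices:
        observed = slice_offset(bits)
        history.append((predicted, observed))
        if observed is not None:
            # Simplest possible predictor (an assumption): the next slice
            # is expected where this one was found.
            predicted = observed
    return history
```

On a skewed line the observed offsets drift steadily in one direction, so a predictor of this kind follows the drift one slice at a time instead of assuming every slice is centred.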
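The automatic re-scan behaviour described in the last sentence of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions. A scan pass is modeled as a function that takes a sensitivity level and returns one entry per character position, with `None` where recognition failed. The integer sensitivity scale, the `max_rescans` limit, and the names `read_line_with_rescans` and `make_fake_scanner` are hypothetical and do not come from the patent.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for one scan pass over a line at a given sensitivity.
ScanFn = Callable[[int], List[Optional[str]]]


def read_line_with_rescans(scan: ScanFn, base_sensitivity: int = 0,
                           max_rescans: int = 3) -> List[Optional[str]]:
    """Scan a line, then re-scan at higher sensitivity while any character fails."""
    sensitivity = base_sensitivity
    result = scan(sensitivity)
    rescans = 0
    while None in result and rescans < max_rescans:
        sensitivity += 1  # each re-scan is more sensitive than the previous pass
        rescans += 1
        result = scan(sensitivity)
    return result  # may still contain None if the re-scan limit was reached


def make_fake_scanner(line: str, min_sensitivity: List[int]) -> ScanFn:
    """Toy scanner: character i is recognized only at or above min_sensitivity[i]."""
    def scan(sensitivity: int) -> List[Optional[str]]:
        return [ch if sensitivity >= need else None
                for ch, need in zip(line, min_sensitivity)]
    return scan
```

In the toy scanner a lightly printed character is simply one with a higher `min_sensitivity`. A line that still contains `None` after the last pass corresponds to a line with an unrecognized character.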

United States Patent [19] Holmes et al.

[54] OPTICAL CHARACTER RECOGNITION SYSTEM

[75] Inventors: Thomas G. Holmes, Melbourne; Harrison B. Lidkea, Satellite Beach; Kenneth L. Seib, Melbourne, all of Fla.

[73] Assignee: Optical Business Machines, Inc., Melbourne, Fla.

[22] Filed: June 7, 1973

[21] Appl. No.: 367,881

[52] U.S. Cl. 340/146.3 AH, 235/61.6 R

[51] Int. Cl. G06k 9/04

[58] Field of Search 340/146.3 H, 146.3 AH, 340/172.5; 235/61.6 R, 61.6 H, 61.7 R, 61.11 E, 61.11 F

Primary Examiner: Gareth D. Shaw

Assistant Examiner: Leo H. Boudreau

Patented Mar. 18, 1975

22 Claims, 141 Drawing Figures

(Drawing sheet 2 of 74: block diagram. Legible labels include photodiode array with parallel channels, multiplexer, threshold adjust, read enable, best match detector, and ASCII code data control.)

1. In an optical character recognition system of the type in which optical images of data characters to be read from data documents are converted to electronic signals for automatic processing and recognition, and wherein said system is capable of ignoring all characters on a data document other than those located in prescribed data fields, a method characterized in that data field marks are not required on said data documents to designate said prescribed data fields, said method being further characterized by a format load mode wherein a format conditioning document, which is separate and apart from data documents to be read and which contains format marks which designate the locations of said prescribed data fields, is processed to identify said prescribed data fields in each of a series of data documents to be read during a read mode, said method including the steps of: in said format load mode, detecting the locations of said format marks on said format conditioning document; in said format load mode, storing the locations of data fields designated by the format marks detected on said format conditioning document; and in said read mode, automatically and successively reading each data document in said series, said reading including the steps of: during the reading of each data document, retrieving the stored data field locations; and processing only those data characters on each data document which are located in data fields which correspond to the retrieved data field locations.
 2. An optical character recognition system of the type in which optical images of data characters to be read from data documents are converted to electronic signals for automatic processing and recognition, and wherein said system is capable of ignoring all characters on a data document other than those located in prescribed data fields, said system being characterized in that data field marks are not required on said data documents to designate said prescribed data fields, said system being further characterized by a format load mode wherein a format conditioning document, which is separate and apart from data documents to be read and which contains format marks which designate the locations of said prescribed data fields, is processed to identify said prescribed data fields in each of a series of data documents to be read during a read mode, said system comprising: means operative in said format load mode for detecting the locations of said format marks on said format conditioning document; memory means; means operative in said format load mode for storing in said memory means the locations of data fields designated by the format marks detected on said format conditioning document; and reading means operative in said read mode for automatically and successively reading each data document in said series, said reading means including: means operative during the reading of each data document for retrieving the stored data field locations from said memory means; and means for processing only those data characters of each data document which are located in data fields which correspond to the retrieved data field locations.
 3. The system according to claim 2 further characterized in that said format marks are arranged in a vertical column on said format conditioning document, each format mark designating a corresponding line on said data documents which contains data to be read and processed by said system.
 4. The system according to claim 2 further characterized in that additional format marks are provided on each of said designated lines, said additional format marks including a beginning of field mark indicating the location on the designated line at which each data field begins, and an end of field mark indicating the location on the designated line at which each data field terminates.
 5. The system according to claim 2 further including: a transport path for said data documents; actuable stepping means for transporting said documents in discrete steps along said transport path; and a read station located in said transport path; wherein said reading means includes: optical scanner means actuable to scan across portions of documents located at said read station; an array of photosensitive elements; and an optical path for projecting scanned portions of documents at said read station onto said photosensitive elements; said system further comprising a line position analysis circuit including: means for monitoring the position of character images in said array; means for detecting when an image is not centered in said array; and means responsive to the detection of a non-centered image in said array for actuating said actuable stepping means to re-position said document until said image is centered in said array.
 6. The system according to claim 5 wherein said line position analysis circuit further includes: predicting means responsive to the position of each character image in said array for predicting the position of the next character image to be received by said array; and means responsive to the predicted position of the next character image to be received for establishing a time interval which is time coincident with the time during which the next character image will be positioned within pre-selected stages of said shift register means if that character image is positioned in said array as predicted by said predicting means.
 7. The system according to claim 6 further characterized by: means operable at regular intervals for quantizing images received at said array into respective binary signals indicating the reception and non-reception of character images at each of the array elements; multi-stage shift register means, operative between quantizing intervals, for serially shifting said binary signals in turn to accumulate signals corresponding to a plurality of successive images received at said array; and mask means for examining pre-selected stages of said shift register means to identify specific characters accumulated in said shift register means.
 8. An optical character recognition system of the type wherein a document is transported in discrete steps of equal length along a transport path containing a read station, wherein an optical scanner is actuable to scan transversely of the transport direction across a line of said document located at said read station, wherein a photo-sensitive detector receives images of characters viewed by said optical scanner and converts said images into electronic signals, and wherein recognition circuitry processes signals for the purpose of identifying characters viewed by said optical scanner, said system being characterized by a format load mode wherein it processes a format conditioning document containing format marks which designate the location of data fields on data documents to be read by the system during a read mode, at least some of said format marks comprising vertical format marks arranged in a predetermined vertical column on said format conditioning document and located on those horizontal lines which correspond to lines on which data appears in said data documents, said system comprising: sensor means located in said transport path for determining when the leading edge of a transported document enters said read station; counter means for counting each transport step of a document after the leading edge of the document has entered said read station; scan control means operative in said format load mode for actuating said optical scanner to scan lines of said format conditioning document presented at said read station for the purpose of identifying vertical format marks at said recognition circuitry; a memory unit; means responsive to identification of a vertical format mark during scan of a line of said format conditioning documents for storing the count from said counter means in said memory unit; means operative in said read mode for successively retrieving said counts from said memory unit; means operative in said read mode for stepping data documents, one at a time, along said transport path and through said read station such that stepping temporarily terminates only when lines on said document corresponding to said successive retrieved counts are positioned at said read station; and wherein said scan control means is operative in said read mode, in response to positioning of a line to be read at said read station, for actuating said optical scanner to scan across said read station.
 9. The system according to claim 8 further characterized in that additional format marks are provided on each of said designated lines, said additional format marks including a beginning of field mark indicating the location on the designated line at which each data field begins, and an end of field mark indicating the location on the designated line at which each data field terminates, said system further comprising: a source of timing pulses; further counter means operative during a scan by said optical scanner for counting timing pulses, such that the count total in said further counting means corresponds to a specific location on a document line being viewed by said optical scanners; means for storing in said memory unit the line location count of each beginning of field and end of field mark identified by said recognition circuitry during said format load mode; means operative during said read mode for successively retrieving said line location counts from said memory unit; and means operative in said read mode for inhibiting character identification at said recognition circuitry except during intervals when the count at said further counter corresponds to counts between the successively retrieved beginning of field and end of field mark location counts.
 10. The system according to claim 8 wherein said photo-sensitive detector comprises an array of photo-sensitive elements onto which the image viewed by said optical scanner is projected; said system further comprising a line position analysis circuit including means operatively connected to said recognition circuitry for determining if an image from a document being scanned is vertically centered in said array, and means responsive to a determination that an image is not so centered for stepping said document as necessary to vertically center said image in said array.
 11. The system according to claim 10 wherein said line position analysis circuitry includes means operative in said format load mode for inhibiting storage of a step count corresponding to an identified vertical format mark location until said identified vertical format mark is vertically centered in said array.
 12. The system according to claim 8 further characterized in that stepping of said documents along said transport path is effected by: a step motor; a common drive cylinder having uniform circumference along its length and arranged to be rotatably stepped about its longitudinal axis by said step motor, said common drive cylinder being disposed with its longitudinal axis oriented transversely of said transport path; at least one pair of transport rollers spaced transversely across said transport path, each roller positioned in circumferential engagement with said common drive cylinder so as to be simultaneously rotated at the same speed by said common drive cylinder; and at least one pair of spaced pinch rollers which are simultaneously actuable to engage respective transport rollers such that a document in said transport path is equally driven by said transport rollers.
 13. The system according to claim 12 further characterized by: a vertically positionable stacker bin for holding a stack of documents to be processed by said system; a feed member arranged to be selectively lowered into said stacker bin toward the uppermost document in said stack for the purpose of engaging said uppermost document and delivering same to said transport path, the distance over which said feed member is lowerable being limited; and means responsive to lowering of said feed member to its limit position and the absence of engagement between said feed member and said uppermost document for automatically lifting said stacker bin until said uppermost document is engaged by said feed member.
 14. The system according to claim 8 further characterized by a reject mode in which said system is operative to automatically reject any document in which one or more characters have not been recognized, said system further comprising: a normal stacker bin and a reject stacker bin, each positioned at the end of said transport path to alternatively receive documents from said transport path; and path control means operative in said reject mode and responsive to said recognition circuitry for automatically directing documents in which all characters have been recognized to said normal stacker bin and directing documents in which at least one character has not been recognized to said reject stacker bin.
 15. The system according to claim 8 further characterized by: means responsive to failure of said recognition circuitry to recognize a data character in a scanned line for automatically initiating a re-scan of the entire line by said optical scanner; and means responsive to failure of said recognition circuitry to recognize a data character during a first re-scan of a line for automatically increasing the detection sensitivity of said photo-sensitive detector and actuating said optical scanner to scan the unrecognized character.
 16. The system according to claim 8: wherein said photo-sensitive detector comprises an array of photo-sensitive elements arranged in a straight line to receive vertical slice images of characters being scanned by said optical scanner; wherein said recognition circuitry includes: means operative at regular intervals for quantizing slice images received at said array into respective binary signals indicating the reception and non-reception of character images at individual array elements; a plurality of shift register means, operative between quantizing intervals, for serially shifting said binary signals through all of said shift register means in turn to accumulate signals corresponding to a plurality of successive slice images received at said array; and a plurality of mask means for examining preselected stages of said plurality of shift register means, each mask means identifying a specific character accumulated in said shift register means.
 17. The system according to claim 16 further comprising a line position analysis circuit arranged to receive said serially shifted binary signals in order to determine the position of individual received character slices relative to the center of said array, said line position analysis circuit further comprising means responsive to the position of each received character slice in said array for predicting the position of the next received character slice in said array.
 18. The system according to claim 17 further comprising means responsive to the predicted position of the next received character slice for generating a signal having a time interval which is time coincident with the time during which the next character slice is positioned within said preselected stages of said shift register means if that character slice is positioned in said array as predicted by said line position analysis circuit.
 19. An optical character recognition system of the type wherein a document is transported along a transport path containing a read station, wherein an optical scanner is actuable to scan transversely of the transport direction across a line of said document located at said read station, wherein a photo-sensitive detector receives images of characters viewed by said optical scanner and converts said images into electronic signals, said photo-sensitive detector comprising an array of photo-sensitive elements arranged in a straight line to receive vertical slice images of characters viewed by said optical scanner, and wherein recognition circuitry processes said signals for the purpose of identifying characters viewed by said optical scanner, said system being characterized in that the vertical position of successive character slices in said array is automatically predicted on the basis of the vertical position of an immediately preceding slice in said array, wherein said recognition circuitry includes: means operative at regular intervals for quantizing slice images received at said array into respective binary signals indicating the reception and non-reception of character images at the individual array elements; a plurality of shift registers, operative between quantizing intervals, for serially shifting said binary signals through all of said shift registers in turn to accumulate signals corresponding to a plurality of successive slice images received at said array; and a plurality of mask means for examining pre-selected stages of said plurality of shift registers, each mask means identifying a specific character accumulated in said shift register means; said system further comprising a line position analysis circuit arranged to receive said serially shifted binary signals in order to determine the position of individual received character slices relative to the center of said array, said line position analysis circuit further comprising means responsive to the position of each received character slice relative to said array for predicting the position of the next received character slice.
 20. The system according to claim 19 further comprising means responsive to the predicted position of the next received character slice for generating a signal having a time interval which is time coincident with the time during which the next character slice is positioned within said preselected stages of said shift register means if that character slice is positioned in said array as predicted by said line position analysis circuit.
 21. In an optical character recognition system of the type wherein a document is transported in discrete steps of equal length along a transport path containing a read station, wherein an optical scanner is actuable to scan transversely of the transport direction across a line of said document located at said read station, wherein a photo-sensitive detector receives images of characters viewed by said optical scanner and converts said images into electronic signals, and wherein recognition circuitry processes said signals for the purpose of identifying characters viewed by said optical scanner, a method of controlling said system to read data located only in predetermined data fields on data documents, said method being characterized by a format load mode for said system wherein the system processes a format conditioning document containing format marks which designate the location of data fields on data documents to be read by the system during a read mode, at least some of said format marks comprising vertical format marks arranged in a predetermined vertical column and located on those horizontal lines which correspond to lines on which data appears on said data documents, said method comprising the steps of: detecting when the leading edge of a transported document enters said read station; counting each transport step of a document after the leading edge of the document has entered said read station; actuating said optical scanner to scan lines of said format conditioning document presented at said read station for the purpose of identifying vertical format marks at said recognition circuitry; storing the transport step count corresponding to the line in which a vertical format mark is identified; during the read mode, successively retrieving said transport step counts from said memory unit; during said read mode, stepping data documents along said transport path and through said read station such that stepping temporarily terminates only when lines on said document corresponding to said successive retrieved counts are at said read station; and actuating said optical scanner to scan across said read station in response to positioning of a line to be read at said read station.
 22. The method according to claim 21 further characterized in that additional format marks are provided on each of said designated lines, said additional format marks including a beginning of field mark indicating the location of the designated line at which each designated field begins, and an end of field mark indicating the location on the designated line at which each data field terminates, said method comprising the additional steps of: generating a series of timing pulses; counting timing pulses during a scan by said optical scanner such that the timing pulse count corresponds to a specific location on a document being viewed by said optical scanner; storing the timing pulse count corresponding to the line location of each beginning of field and end of field mark identified by said recognition circuitry during said format load mode; during said read mode, successively retrieving the timing pulse count corresponding to said line locations of said beginning and end of field marks; and inhibiting character identification by said recognition circuitry during said read mode except during intervals when the current timing pulse count corresponds to a line location being viewed by said optical scanner which is part of a data field designated by the retrieved beginning and end of field line locations.
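As an illustration of the format load and read modes recited in claims 21 and 22, the following minimal Python sketch stores the transport step count of each line that carries a vertical format mark, together with the timing pulse counts of its field marks. It then reads only the designated fields from a data document. It is a simplified software model, not the patented circuitry. Documents are modeled as lists of text lines, the list index stands in for the transport step count, and the character column stands in for the timing pulse count. The mark glyphs (`|`, `[`, `]`), the column-0 position of the vertical mark, and the names `format_load` and `read_document` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumed mark glyphs; the claims do not specify what the marks look like.
VERTICAL_MARK = "|"   # flags a line that carries data (assumed to sit in column 0)
BEGIN_MARK = "["      # beginning of field mark
END_MARK = "]"        # end of field mark

# Hypothetical model of the memory unit:
# transport step count -> list of (begin, end) timing pulse counts.
FormatMemory = Dict[int, List[Tuple[int, int]]]


def format_load(conditioning_doc: List[str]) -> FormatMemory:
    """Format load mode: record where the format marks sit."""
    memory: FormatMemory = {}
    for step_count, line in enumerate(conditioning_doc):
        if not line.startswith(VERTICAL_MARK):
            continue  # no vertical format mark, so this line holds no data
        fields: List[Tuple[int, int]] = []
        begin = None
        for pulse_count, ch in enumerate(line):
            if ch == BEGIN_MARK:
                begin = pulse_count
            elif ch == END_MARK and begin is not None:
                fields.append((begin, pulse_count))
                begin = None
        memory[step_count] = fields
    return memory


def read_document(data_doc: List[str], memory: FormatMemory) -> List[str]:
    """Read mode: stop only at stored lines and read only inside stored fields."""
    output: List[str] = []
    for step_count in sorted(memory):
        if step_count >= len(data_doc):
            break  # the data document is shorter than the conditioning document
        line = data_doc[step_count]
        for begin, end in memory[step_count]:
            # Recognition is enabled only between the two stored counts
            # (treated here as inclusive, which is an assumption).
            output.append(line[begin:end + 1].strip())
    return output
```

Because the layout lives in `memory` rather than in the code, a different form layout needs only a different conditioning document. This mirrors the claim language that no data field marks are required on the data documents themselves.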
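The quantizing, shift register, and mask elements recited in claims 16 and 19 can also be pictured in software. The sketch below is a toy model under stated assumptions, not the claimed hardware. Photo-element outputs are treated as darkness levels compared against a single threshold. The register holds a fixed number of recent slices. Each mask names only the register cells it examines. The 3-slice by 5-element glyphs in `TOY_MASKS` and the names `quantize`, `SliceRegister`, and `match_masks` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Sequence, Tuple

Slice = Tuple[int, ...]
# A mask lists only the cells it examines: (stage, element) -> required bit.
Mask = Dict[Tuple[int, int], int]


def quantize(levels: Sequence[float], threshold: float) -> Slice:
    """Turn one slice of darkness levels into binary signals (1 = ink seen)."""
    return tuple(1 if level >= threshold else 0 for level in levels)


class SliceRegister:
    """Holds the most recent slices; the oldest slice drops out on each shift."""

    def __init__(self, stages: int) -> None:
        self.stages: Deque[Slice] = deque(maxlen=stages)

    def shift_in(self, bits: Slice) -> None:
        self.stages.append(bits)

    def is_full(self) -> bool:
        return len(self.stages) == self.stages.maxlen


def match_masks(register: SliceRegister, masks: Dict[str, Mask]) -> Optional[str]:
    """Return the first character whose mask agrees with every examined cell."""
    if not register.is_full():
        return None
    for name, mask in masks.items():
        if all(register.stages[stage][element] == bit
               for (stage, element), bit in mask.items()):
            return name
    return None


# Toy masks for glyphs 3 slices wide and 5 elements tall (element 0 = top).
TOY_MASKS: Dict[str, Mask] = {
    "I": {(1, 0): 1, (1, 2): 1, (1, 4): 1, (0, 2): 0, (2, 2): 0},
    "L": {(0, 0): 1, (0, 2): 1, (0, 4): 1, (2, 4): 1, (1, 0): 0},
}
```

Examining only selected cells, rather than the whole register, is what lets a mask tolerate small variations in print. This toy version returns the first exact match and makes no attempt at scoring competing masks.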